Abstract

A model of profile analysis is proposed in which a spectral profile is assumed to be represented by a weighted sum of sinusoidally modulated spectra (ripples). The analysis is performed by a bank of bandpass filters, each tuned to a particular ripple frequency and ripple phase. The parameters of the model are estimated using data from the ripple detection experiments reported in [Green, 1986; Hillier, 1991]. Detection thresholds are computed from the filter outputs and compared with perceptual thresholds for profile detection experiments with step, single-component increment, and alternating profiles. The model accounts well for the measured thresholds in these experiments. Physiological and psychophysical evidence from the auditory and visual systems in support of this type of model is also reviewed. The implications of this model for pitch and timbre perception are briefly discussed.

INTRODUCTION 

In characterizing the perception of spectral profiles, a basic objective is to select a model representation upon which an appropriate metric can be defined. Several such models have been proposed to account for data from a wide range of psychoacoustical tests, including profile analysis experiments and the discrimination of simultaneous vowels. Examples are the independent channels model [Durlach, Braida and Ito, 1986], the maximum difference model [Bernstein and Green, 1987], the EWAIF model [Feth, O'Malley and Ramsey, 1982], the weighted slope model [Klatt, 1982], and the spectral peak model [Assmann and Summerfield, 1989]. Despite their individual characteristics, all of these models share the same starting point: the spectral profile is represented by the acoustic spectrum on a logarithmic frequency axis. Various operations are then defined on this profile to predict the perceptual thresholds.

In this paper, an alternative model of profile analysis is proposed that operates not upon the profile directly, but upon its Fourier transform. Specifically, it is hypothesized that an arbitrary profile is represented in the auditory system by a collection of weighted elementary sinusoidal profiles (ripples) of different frequencies and phases. Such a ripple analysis could be accomplished by a bank of filters tuned to different ripple frequencies and phases. Detection thresholds would then be computed from this representation of the profile.

The primary motivation for such a ripple analysis model came from physiological mappings in the primary auditory cortex, AI [Schreiner and Mendelson, 1990; Shamma et al., 1993]. These results revealed that AI potentially encodes, at each point along the tonotopic axis, an explicit measure of the local bandwidth and asymmetry of the acoustic spectrum. A broader interpretation of these two response measures (elaborated below) led to the notion that they may correspond, respectively, to the magnitude and phase of a Fourier transform of the profile. Recent neurophysiological results in AI are consistent with this hypothesis in that cortical cells are tuned to specific (characteristic) ripple frequencies and phases [Calhoun and Schreiner, 1993; Shamma, Versnel and Kowalski, 1993]. Furthermore, these two response properties are columnarly organized and topographically mapped along the isofrequency planes, in a manner similar to that of the response area bandwidths and asymmetry alluded to above.

The idea that the brain analyzes and perceives its sensory patterns in this manner is relatively common in the vision literature, where it has been variously called multi-resolution or multi-scale representation, and spatial frequency analysis [Campbell and Robson, 1968; Levine, 1985]. It is, however, the elegant anatomical and physiological work of [DeValois and DeValois, 1990] that has provided the most immediate inspiration to pursue this type of model for the auditory system.

In the psychoacoustical literature, it is a curious fact that the first doubts about the adequacy of the independent channels model were raised for ripple profile stimuli [Green, 1986]. Specifically, the detection thresholds were found to be remarkably constant and relatively high compared to what would be expected from the channel model. Furthermore, the thresholds could be well accounted for by the maximum difference model [Bernstein and Green, 1987], which assumes only two independent channels of information.

These issues will be examined here in the context of an auditory ripple analysis model. Essential to the development of such a model are basic psychoacoustical measurements of sensitivity to simple ripple profiles. A few such experiments have already been performed [Green, 1986; Hillier, 1991; Houtgast and Veen, 1982]. Using these data and the conceptual framework outlined above, an explicit computational model is developed to interpret the results of various profile analysis experiments.

In the following sections, we first outline the computational model (Sec. I). Its parameters are estimated in Secs. I C and II from experimental results reported here and in [Green, 1986; Hillier, 1991]. The model is used in Sec. III to predict the detection thresholds for several profile analysis experiments. Finally, the model is contrasted with other profile analysis prediction models and discussed in relation to auditory percepts such as timbre and pitch.

A. Terminology and notation

The following terms are used frequently here to describe the ripple analysis representation of profiles:

Ripple: a sinusoidal spectral profile, e.g., $p(\omega) = \sin(2\pi\Omega\omega + \theta)$, on the logarithmic frequency axis $\omega$ (in octaves). A ripple has a ripple frequency $\Omega$ (in cycle/octave) and a ripple phase $\theta$ (in radians or degrees).

Ripple spectrum, $P(\Omega)$: strictly, the Fourier transform of the profile $p(\omega)$.

Ripple analysis filter, $H(\Omega; \Omega_0, \Phi_0)$: a bandpass filter tuned around a characteristic ripple frequency ($\Omega_0$) and phase ($\Phi_0$).

Ripple transform, $r(\cdot)$: the output of a bank of ripple analysis filters.

I. GENERAL DESCRIPTION OF THE RIPPLE ANALYSIS MODEL

The ripple analysis model can be conceptually divided into two stages. The first (Sec. I A) is a ripple transformation stage, which simply converts the input spectral profile $p(\omega)$ into its corresponding ripple transform $r(\cdot)$. The second stage (Sec. II) is a detection model, which operates on the magnitude or phase of the ripple transform, or on selected features of it such as its maxima and edges.

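A. Computing the ripple transform of a spectral profile

This stage consists of a bank of ripple-selective filters analogous to the frequency-selective filters of the cochlea (Fig. 1). The impulse response of each filter, $h(\omega - \omega_0; \Omega_0, \Phi_0)$ (Fig. 1(a)), is centered around $\omega_0$ and is assumed to be a Gaussian-shaped ripple of a particular ripple frequency $\Omega_0$ and ripple phase $\Phi_0$ [Gabor, 1946]. In particular, the filters centered around $\omega_0 = 0$ are defined as:

$$h(\omega; \Omega_0, \Phi_0) = 2\, g_0(\omega) \cos(2\pi\Omega_0\omega - \Phi_0), \qquad (1)$$

where $g_0(\omega) = \sqrt{2\pi}\,\sigma\, e^{-2\pi^2\sigma^2\omega^2}$ and $\sigma$ determines the width of the Gaussian envelope (the Fourier transform of $g_0$ is $e^{-\Omega^2/2\sigma^2}$, a Gaussian of standard deviation $\sigma$ along the $\Omega$ axis). In the Fourier transform domain, these filters are Gaussian shaped (Figs. 1(b) and (c)):

$$H(\Omega; \Omega_0, \Phi_0) = H(\Omega; \Omega_0)\cos(\Phi_0) + \hat{H}(\Omega; \Omega_0)\sin(\Phi_0) = H(\Omega; \Omega_0)\, e^{-j\,\mathrm{sign}(\Omega)\,\Phi_0},$$

where $H(\Omega; \Omega_0)$ is the Fourier transform of $2\,g_0(\omega)\cos(2\pi\Omega_0\omega)$ and $\hat{H}(\Omega; \Omega_0) = -j\,\mathrm{sign}(\Omega)\,H(\Omega; \Omega_0)$ is the Fourier transform of its Hilbert transform. Note that $H(\Omega; \Omega_0)$ is purely real and $\hat{H}(\Omega; \Omega_0)$ is purely imaginary.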
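As a numerical check of Eq. (1), the following minimal Python/NumPy sketch samples the $\Phi_0 = 0$ impulse response for $\Omega_0 = 3$ cycle/octave, approximates its Fourier transform with an FFT, and compares the magnitude with the closed form implied by the Gaussian envelope, $H(\Omega;\Omega_0) = e^{-(\Omega-\Omega_0)^2/2\sigma^2} + e^{-(\Omega+\Omega_0)^2/2\sigma^2}$. The function names, the sampling grid, and the choice $\sigma = 0.3\,\Omega_0$ (taken from Sec. I C) are illustrative assumptions, not part of the original model description.

```python
import numpy as np

def impulse_response(omega, omega0, phi0=0.0, q=0.3):
    """Eq. (1): h(w; W0, P0) = 2 g0(w) cos(2 pi W0 w - P0), filter centered at w0 = 0.

    g0(w) = sqrt(2 pi) * sigma * exp(-2 pi^2 sigma^2 w^2), with sigma = q * W0
    (assumed constant-Q bank)."""
    sigma = q * omega0
    g0 = np.sqrt(2 * np.pi) * sigma * np.exp(-2 * np.pi**2 * sigma**2 * omega**2)
    return 2 * g0 * np.cos(2 * np.pi * omega0 * omega - phi0)

def transfer_function(Omega, omega0, q=0.3):
    """Closed-form H(W; W0): Gaussians of standard deviation sigma at +W0 and -W0."""
    sigma = q * omega0
    return (np.exp(-(Omega - omega0)**2 / (2 * sigma**2))
            + np.exp(-(Omega + omega0)**2 / (2 * sigma**2)))

# Sample h on the tonotopic axis (octaves) and approximate its continuous FT.
n, span = 4096, 8.0                    # 4096 samples covering -4..+4 octaves
d_omega = span / n
omega = -span / 2 + d_omega * np.arange(n)
h = impulse_response(omega, omega0=3.0)
H_fft = np.fft.fft(h) * d_omega        # Riemann sum of the integral of h(w) exp(-j 2 pi W w) dw
Omega = np.fft.fftfreq(n, d=d_omega)   # ripple-frequency axis, cycle/octave

# Only magnitudes are compared: starting the grid at w = -4 adds a pure phase factor.
print(np.allclose(np.abs(H_fft), transfer_function(Omega, 3.0), atol=1e-9))  # True
```

The comparison uses magnitudes only because shifting the sampling origin multiplies the FFT by a unit-modulus phase term; the filter itself, being even for $\Phi_0 = 0$, has a real transfer function.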

In general, for each $\Omega_0$ and $\Phi_0$ there is a whole range of (identical) filters centered at different $\omega_0$'s along the tonotopic $\omega$ axis. Each impulse response is therefore characterized by three parameters, $\Omega_0$, $\Phi_0$, and $\omega_0$, and is given by:

$$h(\omega; \Omega_0, \Phi_0, \omega_0) = h(\omega - \omega_0; \Omega_0, \Phi_0).$$

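The ripple transform at some $\omega_0$ is evaluated from the convolution of the impulse response and the input profile. For an arbitrary (real) input profile $p(\omega)$ with ripple spectrum $P(\Omega) = |P(\Omega)|\,e^{j\theta(\Omega)}$, the response is:

$$\begin{aligned} r(\omega_0; \Omega_0, \Phi_0) &= \int_{-\infty}^{\infty} h(\omega_0 - \omega; \Omega_0, \Phi_0)\, p(\omega)\, d\omega \qquad (2)\\ &= \int_{-\infty}^{\infty} H(\Omega; \Omega_0, \Phi_0)\, P(\Omega)\, e^{+j2\pi\Omega\omega_0}\, d\Omega \\ &= 2\int_{0}^{\infty} H(\Omega; \Omega_0)\, |P(\Omega)| \cos\big(\Phi_0 - \theta(\Omega) - 2\pi\Omega\omega_0\big)\, d\Omega. \end{aligned}$$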
[Figure 1(a) to (c) plots: $h(\omega - \omega_0; \Omega_0, \Phi_0)$ versus $\omega$ (octave); $H(\Omega;1)$, $H(\Omega;3)$, $H(\Omega;8)$ versus $\Omega$ (cycle/octave).]

Figure 1: (a) Impulse responses of three filters with characteristic ripple frequencies $\Omega_0$ = 1, 3, and 8 cycle/octave, and characteristic phase $\Phi_0 = 0$. Filters are centered at $\omega_0 = 0$ octave. The impulse response is computed as $h(\omega; \Omega_0) = 2\sqrt{2\pi}\,\sigma(\Omega_0)\, e^{-2\pi^2\sigma^2(\Omega_0)\,\omega^2} \cos(2\pi\Omega_0\omega)$ for $\sigma(\Omega_0) = 0.3\,\Omega_0$. (b) Fourier transforms of the three impulse responses in (a), plotted on a linear $\Omega$ axis. (c) Same as (b) but plotted on a logarithmic $\Omega$ axis.
This equation can be simplified for profiles that are even or odd symmetric around their center (arbitrarily designated at $\omega_0$), or that otherwise have a constant phase $\theta_0$ around it, so that $\theta(\Omega) = \mathrm{sign}(\Omega)\,\theta_0 - 2\pi\Omega\omega_0$. This assumption applies approximately to all profiles discussed here and in the companion paper [Vranic-Sowers and Shamma, 19xx]. The ripple transform computed at such a point $\omega_0$ is:

$$r(\omega_0; \Omega_0, \Phi_0) = 2\cos(\Phi_0 - \theta_0)\int_{0}^{\infty} H(\Omega; \Omega_0)\, |P(\Omega)|\, d\Omega = \cos(\Phi_0 - \theta_0)\, r(\Omega_0), \qquad (3)$$

where $r(\Omega_0)$ is a "magnitude" part:

$$r(\Omega_0) = \int_{-\infty}^{\infty} H(\Omega; \Omega_0)\, |P(\Omega)|\, d\Omega, \qquad (4)$$

and $\Phi_0 - \theta_0$ is a "phase" part of the ripple transform. (The factor 2 in Eq. (3) disappears in Eq. (4) because, for a real profile, both $H(\Omega; \Omega_0)$ and $|P(\Omega)|$ are even functions of $\Omega$.) The majority of profile analysis tasks reported in the literature and considered here can be effectively described either as a change in the magnitude or as a change in the phase of the ripple transform, as shown in Sec. II.
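Eq. (4) can be evaluated numerically once the profile is sampled on the logarithmic frequency axis. The sketch below is a minimal illustration under stated assumptions: the profile is sampled over a window containing whole ripple cycles, $|P(\Omega)|$ is estimated with an FFT, and the filters are the constant-Q Gaussians with $\sigma(\Omega_0) = 0.3\,\Omega_0$. The function names are hypothetical. For a single ripple of amplitude 0.1 at 2 cycle/octave, $|P(\Omega)|$ consists of two lines at $\pm 2$ cycle/octave, so $r(\Omega_0)$ reduces to $0.1\,H(2; \Omega_0)$ and should peak at $\Omega_0 = 2$ cycle/octave with a height of about 0.1.

```python
import numpy as np

def transfer_function(Omega, omega0, q=0.3):
    """H(W; W0): two Gaussians at +W0 and -W0 with sigma = q * W0 (constant-Q assumption)."""
    sigma = q * omega0
    return (np.exp(-(Omega - omega0)**2 / (2 * sigma**2))
            + np.exp(-(Omega + omega0)**2 / (2 * sigma**2)))

def ripple_transform_magnitude(profile, d_omega, omega0_grid, q=0.3):
    """Discrete approximation of Eq. (4): r(W0) = integral of H(W; W0) |P(W)| dW."""
    n = len(profile)
    P = np.fft.fft(profile) * d_omega          # approximates the continuous ripple spectrum
    Omega = np.fft.fftfreq(n, d=d_omega)       # ripple frequencies, cycle/octave
    d_Omega = 1.0 / (n * d_omega)              # spacing of the ripple-frequency grid
    return np.array([np.sum(transfer_function(Omega, w0, q) * np.abs(P)) * d_Omega
                     for w0 in omega0_grid])

# A 0.1-amplitude ripple at 2 cycle/octave over a 4-octave window (8 whole cycles).
n, span = 1024, 4.0
d_omega = span / n
omega = d_omega * np.arange(n)
profile = 0.1 * np.sin(2 * np.pi * 2.0 * omega)

omega0_grid = np.arange(0.25, 8.0001, 0.05)    # characteristic ripple frequencies
r = ripple_transform_magnitude(profile, d_omega, omega0_grid)
print(omega0_grid[np.argmax(r)], r.max())      # peak near W0 = 2, height about 0.1
```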

B. The representation of the input spectral profile

It is uncertain whether the auditory system represents its input spectral profile on a linear or a logarithmic amplitude scale [Hillier, 1991; Shannon, 1992]. For the model, the input profile $p(\omega)$ is taken to be the linear amplitude spectrum normalized by the amplitude of the base. Like the logarithmic spectrum, this representation is scale-normalized, preserving only the level-independent features of the spectrum. Other possible inputs range from a simple scale-normalized power spectrum to more complicated biologically and psychoacoustically inspired representations, such as the excitation pattern model [Glasberg and Moore, 1990] and the auditory filter models of [Hillier, 1991; Patterson, 1986; Shamma et al., 1986; Slaney and Lyon, 1990; Yang, Wang and Shamma, 1992].

In general, an inappropriate profile representation distorts the intended ripple spectrum $P(\Omega)$. In some cases the distortions are small, as demonstrated in Fig. 1(d) for spectral peaks, or as in the case of small-amplitude ripples, where linear and logarithmic spectra look very similar and their perceptual thresholds are closely matched [Green, 1986; Hillier, 1991; Houtgast and Veen, 1982]. In other cases, the distortion is large but inconsequential in the context of the ripple analysis model, as will be discussed in more detail in Sec. IV A.1.

C. Parameters of the filter bank

The filter bank depicted in Figs. 1(b) and (c) is assumed to be a constant-Q bank, i.e., its filters have constant widths on a logarithmic $\Omega$ axis or, equivalently, widths ($\sigma$'s) that increase linearly with $\Omega_0$. This choice is primarily justified by data from adaptation experiments (both with ripples [Hillier, 1991] and with visual gratings [DeValois and DeValois, 1990]) and from neurophysiological experiments [DeValois, Albrecht and Thorell, 1982; Shamma, Versnel and Kowalski, 1993], in which filter tuning around various $\Omega_0$'s was estimated to be about 1 octave (measured at the half-amplitude points). This corresponds approximately to $\sigma(\Omega_0) = 0.3\,\Omega_0$.

Figure 1: (d) Three input representations of a symmetric peak profile (left) and the corresponding ripple spectrum magnitudes (right). There is little difference between the three representations or their ripple spectra. The solid line is the linear representation of the peak, normalized by the base (right ordinate). The dotted line is the same peak profile represented on a logarithmic amplitude scale (left ordinate). The dashed line depicts the output of the excitation pattern model (no corrections were applied in the model, and the base was 0 dB amplitude; see [Glasberg and Moore, 1990] for details).
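The stated correspondence between a 1-octave half-amplitude bandwidth and $\sigma(\Omega_0) = 0.3\,\Omega_0$ can be checked directly. Assuming the two-Gaussian form $H(\Omega;\Omega_0) = e^{-(\Omega-\Omega_0)^2/2\sigma^2} + e^{-(\Omega+\Omega_0)^2/2\sigma^2}$, the half-amplitude points of the positive lobe lie at $\Omega_0(1 \pm 0.3\sqrt{2\ln 2})$, giving $\log_2[(1 + 0.353)/(1 - 0.353)] \approx 1.07$ octaves, independent of $\Omega_0$. The sketch below (hypothetical helper names) confirms this numerically for several $\Omega_0$.

```python
import numpy as np

def transfer_function(Omega, omega0, q=0.3):
    """H(W; W0) with sigma = q * W0 (constant-Q assumption)."""
    sigma = q * omega0
    return (np.exp(-(Omega - omega0)**2 / (2 * sigma**2))
            + np.exp(-(Omega + omega0)**2 / (2 * sigma**2)))

def half_amplitude_bandwidth_octaves(omega0, q=0.3, n=200_001):
    """Width, in octaves, of the region where H >= 0.5 * max(H), for W > 0 only."""
    Omega = np.linspace(1e-3 * omega0, 4 * omega0, n)
    H = transfer_function(Omega, omega0, q)
    above = Omega[H >= 0.5 * H.max()]
    return np.log2(above.max() / above.min())

k = np.sqrt(2 * np.log(2)) * 0.3               # half-width at half-amplitude, in units of W0
closed_form = np.log2((1 + k) / (1 - k))
bandwidths = {w0: half_amplitude_bandwidth_octaves(w0) for w0 in (1.0, 3.0, 8.0)}
print(round(closed_form, 3), {w0: round(bw, 3) for w0, bw in bandwidths.items()})
```

Because the numerical grid scales with $\Omega_0$, the three bandwidths agree, which is just the constant-Q property restated.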

A fundamental consequence of the constant-Q property of the filters is that a dilation (or stretching) of the input profile (around $\omega_0$) by a factor $\alpha$, i.e., $p(\omega) \rightarrow p(\alpha\omega)$ or $P(\Omega) \rightarrow \frac{1}{\alpha}P(\Omega/\alpha)$, causes only a simple translation of $r(\Omega_0)$ by $\log_2\alpha$ octaves along the logarithmic $\Omega_0$ axis, leaving the shape of the ripple transform unaltered¹. This property is illustrated in Fig. 2 for the ripple input at $\Omega_1 = 2$ cycle/octave and its dilated version ($\alpha = 1.5$) at $\Omega_2 = 3$ cycle/octave. The corresponding magnitudes of the ripple transforms are identical apart from a 0.58 octave shift (Fig. 2(c)). Note also that a pure input dilation leaves the response as a function of $\Phi_0$ unaltered, i.e., it is largest at $\Phi_0 = \theta_0$ as before.

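¹To see this, consider the effect on the ripple transform $r(\cdot)$ of dilating its input by a factor $\alpha$. The response becomes:

$$r_\alpha(\Omega_0) = \int_{-\infty}^{\infty} H(\Omega; \Omega_0)\, |P(\Omega/\alpha)|\, \frac{d\Omega}{\alpha}.$$

Evaluating $r_\alpha(\cdot)$ at $\Omega_0 = \alpha\Omega_0'$ and letting $\Omega/\alpha = \Omega'$, we get:

$$r_\alpha(\alpha\Omega_0') = \int_{-\infty}^{\infty} H(\alpha\Omega'; \alpha\Omega_0')\, |P(\Omega')|\, d\Omega' = \int_{-\infty}^{\infty} H(\Omega'; \Omega_0')\, |P(\Omega')|\, d\Omega' = r(\Omega_0'),$$

where the second equality uses the constant-Q property $H(\alpha\Omega; \alpha\Omega_0) = H(\Omega; \Omega_0)$, which holds because $\sigma(\alpha\Omega_0) = \alpha\,\sigma(\Omega_0)$. The result is identical in shape to $r(\Omega_0)$ prior to dilation, except for a translation by $\log_2\alpha$ octaves along the $\Omega_0$ axis.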
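The translation property can be illustrated with the example of Fig. 2. For a single ripple $p(\omega) = a\sin(2\pi\Omega\omega)$, Eq. (4) gives $r(\Omega_0) = a\,H(\Omega; \Omega_0)$, so under the constant-Q assumption the transform of the 3 cycle/octave ripple evaluated at $1.5\,\Omega_0$ should equal the transform of the 2 cycle/octave ripple evaluated at $\Omega_0$, and the two peaks should be $\log_2 1.5 \approx 0.58$ octave apart. A minimal sketch, with hypothetical helper names:

```python
import numpy as np

def transfer_function(Omega, omega0, q=0.3):
    """H(W; W0) with sigma = q * W0 (constant-Q assumption)."""
    sigma = q * omega0
    return (np.exp(-(Omega - omega0)**2 / (2 * sigma**2))
            + np.exp(-(Omega + omega0)**2 / (2 * sigma**2)))

def single_ripple_transform(ripple_freq, omega0_grid, amplitude=0.1):
    """r(W0) for p(w) = amplitude * sin(2 pi W w): |P| has lines of height amplitude/2 at +/-W."""
    return amplitude * transfer_function(ripple_freq, omega0_grid)

alpha = 1.5
omega0_grid = 2.0 ** np.linspace(-2, 4, 6001)            # 0.25 to 16 cycle/octave, log-spaced
r1 = single_ripple_transform(2.0, omega0_grid)           # W1 = 2 cycle/octave
r2 = single_ripple_transform(2.0 * alpha, omega0_grid)   # dilated ripple, W2 = 3 cycle/octave

# Shape invariance: the dilated transform, read at alpha * W0, reproduces r1 at W0.
print(np.allclose(single_ripple_transform(2.0 * alpha, alpha * omega0_grid), r1))  # True
# Peak separation on the logarithmic W0 axis, in octaves (about log2 1.5).
shift = np.log2(omega0_grid[np.argmax(r2)] / omega0_grid[np.argmax(r1)])
print(round(shift, 3))
```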
[Figure 2 panels (b) and (c): abscissae $\Omega$ and $\Omega_0$, in cycle/octave.]

Figure 2: (a) Two ripple profiles with amplitudes 0.1 and frequencies $\Omega_1 = 2$ cycle/octave and $\Omega_2 = 3$ cycle/octave. (b) Ripple spectrum magnitudes corresponding to the two ripple profiles. (c) Ripple transform magnitudes $r(\Omega_0)$ of the profiles in (a) (solid lines). The dashed line is a polynomial approximation to the measured data points $\mathrm{idl}(\Omega_0 = \Omega)$ (denoted by circles) reproduced from Fig. 3.27 in [Hillier, 1991]. The detection threshold $K(\Omega_0)$ reflects the shape of the perceptual threshold $\mathrm{idl}(\Omega_0 = \Omega)$. The maxima of $r(\Omega_0)$ are at their just-detectable levels.
II. DETECTION PROCEDURES

Various detection procedures are developed in this section to predict perceptual thresholds from the ripple transform $r(\cdot)$. Two types of profile analysis paradigms will be distinguished. (A) Those in which the profile is to be detected against a flat standard, i.e., the task is to detect the existence of the profile. Most profile detection experiments fall into this category, including those described by [Bernstein and Green, 1987; Bernstein, Richards and Green, 1987] and by [Green, 1986; Hillier, 1991]. The latter are ripple detection experiments, called here ripple intensity-difference-limen (ripple-idl) experiments. (B) Those in which the standard is not flat. Instead, the subject is to detect a change in some parameter of an audible profile, for instance, in the frequency [Hillier, 1991] or phase of a ripple, or in the height of a pedestal profile [Green, 1988].

A. Detection procedures for tests with flat standards

In such tests, the amplitude of the profile $p(\omega)$ is gradually increased until detection occurs. In the context of the ripple analysis model, therefore, detection occurs when the magnitude of the ripple transform $r(\Omega_0)$ (Eq. (4)) exceeds some perceptual threshold level $K(\Omega_0)$ (Fig. 2(c)). To determine $K(\Omega_0)$, we use the ripple-idl threshold measurements reported in [Hillier, 1991]. Figure 2(c) illustrates $r(\Omega_0)$ for two just-detectable ripples $\Omega_1$ and $\Omega_2$ with amplitudes 0.1. To obtain the same detection results from the model, we define $K(\Omega_0) = \mathrm{idl}(\Omega_0 = \Omega)$, as shown in Fig. 2(c)².

In Sec. III, we compute $r(\Omega_0)$ at perceptual thresholds for several profiles and compare it to $K(\Omega_0)$ in order to evaluate the performance of the model.
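Because the model is linear, scaling the profile by a gain $g$ scales $r(\Omega_0)$ by $g$, so the flat-standard criterion can be read as a predicted threshold gain $g^* = \min_{\Omega_0} K(\Omega_0)/r(\Omega_0)$, with $r$ computed for a unit-amplitude version of the profile. The sketch below illustrates only this arithmetic. The $K$ curve in the example is a hypothetical flat placeholder, not the polynomial fit to the idl data of [Hillier, 1991]; that curve would have to be supplied from Fig. 2(c).

```python
import numpy as np

def transfer_function(Omega, omega0, q=0.3):
    """H(W; W0) with sigma = q * W0 (constant-Q assumption)."""
    sigma = q * omega0
    return (np.exp(-(Omega - omega0)**2 / (2 * sigma**2))
            + np.exp(-(Omega + omega0)**2 / (2 * sigma**2)))

def threshold_gain(r_unit, K):
    """Smallest gain g for which g * r(W0) reaches K(W0) at some W0 (linear model).

    r_unit: ripple-transform magnitude of the profile at unit amplitude.
    K:      perceptual threshold curve sampled on the same W0 grid."""
    r_unit = np.asarray(r_unit, dtype=float)
    K = np.asarray(K, dtype=float)
    valid = r_unit > 0                       # channels with zero response can never trigger detection
    return np.min(K[valid] / r_unit[valid])

omega0_grid = 2.0 ** np.linspace(-2, 3, 501)       # 0.25 to 8 cycle/octave
r_unit = transfer_function(2.0, omega0_grid)       # unit-amplitude ripple at 2 cycle/octave
K_placeholder = np.full_like(omega0_grid, 0.1)     # HYPOTHETICAL flat K, for illustration only
g = threshold_gain(r_unit, K_placeholder)
print(g)   # about 0.1: this ripple would be detected at an amplitude of about 0.1
```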

B. Detection procedures for tests with non-flat standards

Tests with non-flat standards involve threshold measurements of a parameter change in an audible profile. Detection procedures for three types of such tests are considered in this paper: (1) dilation of a profile, exemplified by frequency-difference-limen (fdl) measurements for ripple profiles [Hillier, 1991]; (2) ripple phase shift, such as the phase-difference-limen (pdl) measurements for ripple profiles (Sec. II B.2); (3) change in overall amplitude, e.g., the pedestal-type experiments with single-increment profiles [Green, 1988]. In relation to the ripple transform $r(\Omega_0, \Phi_0)$ in Eq. (3), these three tests correspond, respectively, to a translation of the magnitude $r(\Omega_0)$ along the $\Omega_0$ axis, a translation in the phase $\Phi_0 - \theta_0$, and a change in the amplitude of the ripple transform.

²This detection procedure is only the simplest of many possible schemes. For instance, one may assume $K$ to be constant and instead weight the input profile or the filter heights by the inverse of the idl bowl [Hillier, 1991]. While these procedures are equivalent with respect to the single-ripple idl's, they generally have different consequences for arbitrary input profiles. In the absence of additional supporting data, we adopt the simplest approach, taking $K$ to be a function of $\Omega_0$ as shown in Fig. 2(c).
1. Detection of a profile dilation

As discussed earlier, dilation of a profile causes the magnitude of its ripple transform to translate along the logarithmic $\Omega_0$ axis without an overall change in shape. In the ripple analysis model, it is assumed that subjects detect this shift in $r(\Omega_0)$. Since the shift can be measured anywhere on the pattern, e.g., at its maximum or at its right or left edges, we choose to measure it at the steepest lowpass edge of $r(\Omega_0)$³.

To predict the dilation thresholds of arbitrary profiles, we use the ripple-fdl measurements of [Hillier, 1991]. The data are reproduced by the dashed line in Fig. 3(c). The solid curve in Fig. 3(c) shows the same data translated by approximately 0.3 octave (i.e., $\log_2(\Omega_0/\Omega)$, where $\Omega = 0.8\,\Omega_0$) to compensate for the misalignment between the location of the lowpass edge of $r(\Omega_0)$ and the ripple frequency $\Omega$. The detectable shift is computed as $\Delta(\Omega_0) = \log_2(1 + \mathrm{fdl}(0.8\,\Omega_0))$ octave, where $\Omega = 0.8\,\Omega_0$ is the ripple frequency.

The dilation threshold can therefore be estimated for any arbitrary profile as follows: (1) compute the magnitude of the ripple transform $|r(\Omega_0, \Phi_0)|$ of the profile; (2) locate the steepest lowpass edge of the pattern along the $\Omega_0$ axis; (3) determine the shift $\Delta(\Omega_0)$ from the solid curve in Fig. 3(c), and the dilation threshold $\alpha$ from $\Delta = \log_2\alpha$⁴.
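The three-step procedure above can be sketched as follows. Two choices are assumptions of this illustration rather than details given in the text: edge steepness is measured as the slope of $|r|$ with respect to $\log_2\Omega_0$, and the fdl curve is passed in as a function. The constant 20% used in the example is a placeholder that matches only the single value quoted for 2 cycle/octave in Fig. 3, not the measured curve. Since $\Delta = \log_2(1 + \mathrm{fdl}(0.8\,\Omega_0))$ and $\Delta = \log_2\alpha$, the predicted threshold is simply $\alpha = 1 + \mathrm{fdl}(0.8\,\Omega_0)$ evaluated at the edge.

```python
import numpy as np

def transfer_function(Omega, omega0, q=0.3):
    """H(W; W0) with sigma = q * W0 (constant-Q assumption)."""
    sigma = q * omega0
    return (np.exp(-(Omega - omega0)**2 / (2 * sigma**2))
            + np.exp(-(Omega + omega0)**2 / (2 * sigma**2)))

def steepest_lowpass_edge(r_mag, omega0_grid):
    """Step 2: W0 where |r| falls most steeply with increasing log2 W0."""
    slope = np.gradient(r_mag, np.log2(omega0_grid))
    return omega0_grid[np.argmin(slope)]    # most negative slope = steepest falling edge

def dilation_threshold(r_mag, omega0_grid, fdl):
    """Step 3: Delta(W0) = log2(1 + fdl(0.8 * W0)) at the edge, then alpha = 2 ** Delta."""
    edge = steepest_lowpass_edge(r_mag, omega0_grid)
    delta = np.log2(1 + fdl(0.8 * edge))
    return edge, 2.0 ** delta

def fdl_placeholder(ripple_freq):
    """HYPOTHETICAL flat fdl of 20%; substitute the measured curve of Fig. 3(c)."""
    return 0.2

omega0_grid = 2.0 ** np.linspace(-2, 4, 6001)
r_mag = 0.1 * transfer_function(2.0, omega0_grid)   # step 1: ripple at 2 cycle/octave
edge, alpha = dilation_threshold(r_mag, omega0_grid, fdl_placeholder)
print(round(edge, 2), round(alpha, 3))
```

With this Gaussian filter shape the steepest falling edge of a 2 cycle/octave ripple lies near $\Omega_0 \approx 2.65$, i.e., at a ripple-to-edge ratio of about 0.75, in the same neighborhood as the 0.8 factor used in the text.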

2. Detection procedure for phase shift in the ripple transform

In this test, the phase of the ripple transform, $\Phi_0 - \theta_0$, is translated while the magnitude $r(\Omega_0)$ is held constant (see Eq. (3)). For an arbitrary profile, the threshold is defined as the minimum detectable phase angle added to all components of the ripple spectrum. For a single-ripple profile, this simply entails measuring the sensitivity to a phase shift in the profile, i.e., the phase-difference-limen (pdl) of the ripple. Just as with the ripple-fdl measurements above, the ripple-pdl can be incorporated into the model to predict perceptual thresholds for arbitrary profiles. Experiments to obtain such pdl data are briefly described below.

³While other features of $r(\cdot)$ may be equivalent, our choice is motivated by the fact that the ripple transforms of arbitrary profiles are necessarily bandlimited (i.e., have lowpass edges) but may not always exhibit clear maxima.

⁴There are more explicit (and elaborate) schemes for incorporating the fdl curve into the model. For instance, one can set a constant detectable shift in the output and change the model parameters so that larger profile dilations are required to produce this (constant) output shift for ripples below 0.7 cycle/octave and above 6 cycle/octave (Fig. 3(c)). One way to accomplish this is to add a constant to $\sigma$ (e.g., $\sigma(\Omega_0) = 0.3\,\Omega_0 + 0.05$). This substantially increases the relative width of the filters in the low-$\Omega_0$ region (below 1 cycle/octave), in effect increasing the fdl's in this range, as observed in the data. Similarly, the fdl increase in the high-$\Omega_0$ region (above 6 cycle/octave) may be related to the increasing idl's there (Fig. 2(c)) and, hence, can be accounted for by introducing level-sensitive procedures for the detection of shifts in $r(\cdot)$.
Figure 3: (a) Ripple spectrum magnitudes of two ripple profiles at $\Omega_1 = 2$ cycle/octave and $\Omega_2 = 2.4$ cycle/octave. (b) Corresponding ripple transform magnitudes, $r(\Omega_0)$. Both ripples are well above their detectable levels (i.e., the maxima of $r(\Omega_0)$ significantly exceed $K(\Omega_0)$). The two ripple frequencies are 20% apart, which is the fdl threshold at $\Omega_1 = 2$ cycle/octave. This corresponds to a dilation factor of $\alpha = 1.2$, or an $r(\Omega_0)$ translation of $\Delta = \log_2\alpha \approx 1/4$ octave. The dashed lines denote the locations of the steepest lowpass edges in $r(\Omega_0)$ for $\Omega_1$ and $\Omega_2$. (c) The dashed line is the interpolated fdl threshold of the data (denoted by circles) reproduced from Fig. 3.30 in [Hillier, 1991]; the values are shown on the right ordinate. Note that for the fdl curve, the abscissa denotes the ripple frequency $\Omega$. The solid line represents the shifts ($\Delta$) observed in the lowpass edges of $r(\Omega_0)$ corresponding to the same fdl data (see text for details).
(i) Methods

Sensitivity to ripple phase changes was measured in ripple profiles on a dB amplitude scale (Fig. 4(a)), and the pdl thresholds are reported in degrees.

Sounds were generated at a 25 kHz sampling rate via a Data Acquisition/Control Unit (HP3852A) and two 16-bit 2-channel arbitrary waveform DACs (HP44726A). They were low-pass filtered at 10 kHz and passed through an equalizer (IEQ One-Third Octave Intelligent Programmable) for level adjustment. Before presentation to listeners, sounds were gated to a 110 ms duration, including 10 ms rise and decay ramps. Sounds were delivered inside an acoustic chamber through a loudspeaker (ADS L470), i.e., without headphones.

A two-interval, two-alternative forced-choice adaptive procedure was used to estimate the thresholds. Each trial consisted of two 110 ms observation intervals separated by a 500 ms pause. After the listener's response, brief visual feedback was provided and a new trial started, until all 50 trials that comprise one block had been presented.

The discrimination task was to distinguish between the standard, which did not change over a block of trials, and the signal, which resembled the standard except for an adaptive change in the ripple phase on each trial. The step size was defined in terms of a change in the ripple phase, and it varied from 0.6° to 4° depending on the testing condition.

On the first trial, the signal was three step sizes away from the standard. On each subsequent trial, the signal was changed according to a two-down, one-up procedure in order to estimate the level that produces 70.7% correct responses [Levitt, 1971]. The step size was halved after three reversals, and the threshold was estimated as the average of the signal across the last even number of reversals, excluding the first three. Signal and standard occurred with equal a priori probability in either of the two intervals.
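The adaptive rule described above can be sketched as a small function. This is an illustration, not the software used in the experiments: the `respond` callback (returning True for a correct trial), the single halving of the step after the third reversal, and the handling of an odd number of remaining reversals (dropping the earliest one) are our reading of the text. In the example, a hypothetical deterministic listener answers correctly whenever the phase change is at least 10°.

```python
def two_down_one_up(respond, step, n_trials=50, halve_after=3):
    """Two-down/one-up staircase converging on about 70.7% correct [Levitt, 1971].

    respond(level) -> bool is True when the listener answers correctly at that
    level (here, level is the ripple-phase change in degrees)."""
    level = 3 * step                  # first trial: three step sizes from the standard
    n_correct = 0                     # consecutive correct answers at the current level
    direction = 0                     # +1 = last move up, -1 = last move down, 0 = none yet
    reversals = []
    for _ in range(n_trials):
        if respond(level):
            n_correct += 1
            if n_correct < 2:
                continue              # two correct in a row are needed before stepping down
            n_correct, move = 0, -1
        else:
            n_correct, move = 0, +1
        if direction and move != direction:
            reversals.append(level)   # change of direction: record a reversal
            if len(reversals) == halve_after:
                step /= 2             # step size halved after the first three reversals
        direction = move
        level = max(level + move * step, 0.0)
    usable = reversals[halve_after:]  # exclude the first three reversals
    if len(usable) % 2:
        usable = usable[1:]           # keep the last even number of reversals
    return sum(usable) / len(usable) if usable else float("nan")

# Hypothetical deterministic listener: correct whenever the phase change is >= 10 degrees.
print(two_down_one_up(lambda phase: phase >= 10.0, step=4.0))   # 9.0
```

With this idealized listener the track settles into alternating 8° and 10° reversals, so the estimate falls between the two levels; a real listener's responses are probabilistic and the estimate converges on the 70.7%-correct point instead.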

The overall presentation level was randomized across trials, and within a trial, over a 20 dB range in 1 dB steps, in order to ensure that listeners based their judgement on a change in spectral shape rather than on an absolute level change in a particular frequency band [Green, 1988].

The results reported are based on data from two normal-hearing subjects. Subjects were trained for about a week (four days a week, 60 to 90 minutes per day) before the actual recording took place.

(ii) Stimulus

For all testing conditions, the number of frequency components was 161 (about 34 per octave), and the components were equally spaced on a logarithmic scale between 0.2 and 5 kHz. The starting ripple phase was kept constant at zero degrees for the data reported here; other starting phases were also tested, and the results were very similar. The ripple frequency ($\Omega$) was fixed over a set of trials. The peak-to-valley ratio was defined as $20\log_{10}(a_{max}/a_{min})$, where $a_{max}$ and $a_{min}$ are the peak and valley amplitudes of the profile (Fig. 4(a)).
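The stimulus profile described above can be sketched as follows. The sketch assumes that the ripple is sinusoidal in dB with half the peak-to-valley ratio as its amplitude, and that the ripple phase is referenced to the lowest component at 0.2 kHz; the text does not specify the reference point. The function name and parameters are illustrative. Note that 161 components spread over the $\log_2 25 \approx 4.64$ octaves from 0.2 to 5 kHz give about 34.5 components per octave.

```python
import numpy as np

def ripple_stimulus_profile(ripple_freq, peak_to_valley_db, phase_deg=0.0,
                            f_lo=200.0, f_hi=5000.0, n_components=161):
    """Component frequencies (Hz) and linear amplitudes of a dB-scale ripple profile.

    Components are equally spaced on a log-frequency axis; the level of each one is
    (peak_to_valley_db / 2) * sin(2 pi W x + phase) dB, where x is octaves above f_lo."""
    freqs = f_lo * (f_hi / f_lo) ** np.linspace(0.0, 1.0, n_components)
    x = np.log2(freqs / f_lo)
    level_db = 0.5 * peak_to_valley_db * np.sin(2 * np.pi * ripple_freq * x
                                                 + np.deg2rad(phase_deg))
    return freqs, 10.0 ** (level_db / 20.0)

freqs, amps = ripple_stimulus_profile(ripple_freq=2.0, peak_to_valley_db=15.0)
per_octave = (len(freqs) - 1) / np.log2(freqs[-1] / freqs[0])
pv_db = 20 * np.log10(amps.max() / amps.min())   # just under 15 dB: peaks fall between components
print(len(freqs), round(per_octave, 1), round(pv_db, 2))
# Phase-shifted signal, e.g. +16 degrees as in Fig. 4(a):
_, amps_shifted = ripple_stimulus_profile(2.0, 15.0, phase_deg=16.0)
```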

(iii) Results

The average data for the two subjects are presented in Fig. 4(b) as a function of ripple frequency for two ripple levels. The results show that pdl's are constant below about 2 cycle/octave at both levels tested, reaching a minimum of about 6° for the larger level. Phase sensitivity decreases with increasing ripple frequency beyond 2 cycle/octave.

Figure 4(c) depicts the data for individual subjects as a function of ripple level. Thresholds saturate with increasing level at all ripple frequencies tested.

(iv) Discussion

The data in Fig. 4(b) suggest that, at low $\Omega$'s, subjects detect a constant phase shift rather than a constant displacement of the peaks, as is probably the case for ripple frequencies higher than 2 cycle/octave. Since the slope of the pdl curves for $\Omega > 2$ cycle/octave (Fig. 4(b)) is approximately 3.8° per cycle/octave, the constant positional shift can be estimated to be approximately 0.73% (or 0.01 octave).
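The conversion from pdl slope to positional shift is simple arithmetic. Reading the 3.8° slope as degrees per cycle/octave of ripple frequency (our interpretation), a fixed displacement of $d$ octaves of a ripple at $\Omega$ cycle/octave corresponds to a phase shift of $360\,d\,\Omega$ degrees, so $d = 3.8/360 \approx 0.0106$ octave, i.e., a frequency ratio of $2^d - 1 \approx 0.73\%$:

```python
slope_deg_per_cyc_oct = 3.8                     # pdl slope above 2 cycle/octave (Fig. 4(b))
shift_octaves = slope_deg_per_cyc_oct / 360.0   # phase = 360 * d * W  =>  d = slope / 360
shift_percent = (2.0 ** shift_octaves - 1.0) * 100.0
print(round(shift_octaves, 4), round(shift_percent, 2))   # 0.0106 0.73
```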

In summary, given any arbitrary profile whose ripple transform contains significant energy below 1 cycle/octave, the model predicts a constant phase-shift detection threshold of approximately 6°. Thresholds should slowly begin to increase if the ripple transform is shifted to ripple frequencies greater than 1 cycle/octave.

3. Detection procedure for pedestal-type experiments

In pedestal-type experiments, an audible profile is increased in amplitude until the change is detected. Since the ripple analysis model presented here is linear, the ripple transform of the profile increases in proportion to the input profile. To specify the detection thresholds, it is necessary to collect idl-like data for different ripples at various ripple pedestals. Such data are not available at present.

As an approximate measure, one can use the data from single-component pedestal experiments [Green, 1988]. This narrow profile has a broadband ripple transform and, hence, its detection threshold might be assumed to be due to the most sensitive ripple components. Thus, one can at least predict that arbitrary profiles with similarly broad ripple transforms should exhibit comparable thresholds. This conjecture will be tested in Part II of this paper [Vranic-Sowers and Shamma, 19xx].

C. Stochastic detection procedures

The filter bank of the ripple analysis model can be viewed as a set of independent channels conveying information about the ripple spectrum $P(\Omega)$ of the profile. In this sense, it is analogous to the classical view of the critical-band channels operating upon the spectral profile. Hence, the independent channels model of [Durlach, Braida and Ito, 1986] and the more specific model of [Bernstein and Green, 1987] can be formally adapted and applied to the outputs of the ripple filter bank. This strategy is not pursued here because of the lack of sufficient data on channel parameters such as their variances.

Figure 4: (a) A sinusoidal ripple profile with $\Omega = 2$ cycle/octave and 15 dB peak-to-valley amplitude (computed as $20\log_{10}(a_{max}/a_{min})$). The dashed line is its 16° phase-shifted version. (b) Phase-difference-limen threshold (pdl) as a function of ripple frequency for 15 dB and 25 dB peak-to-valley amplitudes (or ripple levels), averaged over 2 subjects. (c) Individual pdl thresholds at three ripple frequencies as a function of ripple level. (Subject 1 was tested at 0.25 and 8 cycle/octave, and subject 2 at 2 cycle/octave and at an additional 20 dB ripple level.)

D. Summary of the ripple analysis model

The ripple analysis model consists of the following computational steps:

(1) Compute $P(\Omega)$, the ripple spectrum of the input profile $p(\omega)$, as $P(\Omega) = \int p(\omega)\, e^{-j2\pi\Omega\omega}\, d\omega$.

(2) Compute $r(\Omega_0, \Phi_0)$, the output of the filter bank, using Eq. (2) (or, in the special case, Eq. (3)). The width of the filter $H(\Omega; \Omega_0)$ centered at $\Omega_0$ is determined from $\sigma(\Omega_0) = 0.3\,\Omega_0$.

(3) For flat standard profile experiments, compare the magnitude of the ripple transform $|r(\Omega_0, \Phi_0)|$ to the perceptual threshold curve $K(\Omega_0)$ shown in Fig. 2(c).

(4) For non-flat standard profile experiments, three types of tests are considered in the model: (a) fdl-type tests: the threshold is estimated from $\Delta(\Omega_0)$ in Fig. 3(c), where $\Omega_0$ is the location of the steepest lowpass edge in $|r(\Omega_0, \Phi_0)|$; (b) pdl-type tests: the threshold is estimated from the pdl curve in Fig. 4(b); (c) pedestal-type experiments: for broad profiles, the threshold is the smallest detectable percentage change in $|r(\Omega_0, \Phi_0)|$. It is assumed to be approximately equal to the single component pedestal thresholds given in Fig. 5.4 of [Green, 1988].
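Steps (1) to (3) can be sketched numerically. The Python/NumPy sketch below rests on several assumptions that are not taken from the paper: the profile (here, its deviation from the flat standard) is sampled on a uniform octave axis; $P(\Omega)$ is approximated by a scaled FFT; and, since Eqs. (2) and (3) are not reproduced in this section, the filter output is taken to be the magnitude-only integral $\int |P(\Omega)|\,H(\Omega;\Omega_0)\,d\Omega$ with a hypothetical Gaussian $H$ of width $\sigma(\Omega_0) = 0.3\,\Omega_0$. The function names `ripple_spectrum`, `ripple_transform_magnitude`, and `is_detectable` are illustrative only, and no threshold curve $K(\Omega_0)$ is assumed; the caller must supply one.

```python
import numpy as np


def ripple_spectrum(profile, d_omega):
    """Step (1): approximate P(Omega) = int p(w) exp(-j 2 pi Omega w) dw by a scaled FFT.

    `profile` holds samples of p(w) on a uniform log-frequency axis (octaves)
    spaced `d_omega` apart. Returns non-negative ripple frequencies (cycle/octave)
    and the corresponding complex spectrum values.
    """
    P = np.fft.rfft(profile) * d_omega                # Riemann-sum approximation
    Omega = np.fft.rfftfreq(len(profile), d=d_omega)
    return Omega, P


def ripple_transform_magnitude(Omega, P, Omega0, sigma_rel=0.3):
    """Step (2), ASSUMED magnitude-only form (not a transcription of Eq. (2)/(3)):
    r(Omega0) = int |P(Omega)| H(Omega; Omega0) dOmega, with a hypothetical
    Gaussian H of width sigma(Omega0) = sigma_rel * Omega0 (Omega0 must be > 0)."""
    dOmega = Omega[1] - Omega[0]
    r = []
    for w0 in np.atleast_1d(Omega0):
        H = np.exp(-(Omega - w0) ** 2 / (2.0 * (sigma_rel * w0) ** 2))
        r.append(np.sum(np.abs(P) * H) * dOmega)
    return np.array(r)


def is_detectable(r, K):
    """Step (3): flat-standard test. Detectable if |r| exceeds K(Omega0) anywhere.
    K must be supplied by the caller (e.g. read off Fig. 2(c)); no values are assumed."""
    return bool(np.any(np.abs(r) > K))


# Usage: deviation from a flat standard = a 2 cycle/octave ripple over 4 octaves.
n, span = 512, 4.0
d_omega = span / n
w = np.arange(n) * d_omega
deviation = 0.05 * np.sin(2 * np.pi * 2.0 * w)       # 8 whole cycles, no spectral leakage
Omega, P = ripple_spectrum(deviation, d_omega)
Omega0 = np.linspace(0.5, 8.0, 301)
r = ripple_transform_magnitude(Omega, P, Omega0)
print("r(Omega0) peaks near", Omega0[np.argmax(r)], "cycle/octave")
```

Scaling the deviation by a positive factor scales every $r(\Omega_0)$ by the same factor, which is the proportionality relied on for the pedestal-type experiments.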

III. PREDICTIONS OF THE RIPPLE ANALYSIS MODEL FOR VARIOUS INPUT PROFILES

In this section, the model is used to account for the perceptual thresholds measured in several profile analysis experiments. Since the ripple profile data of [Green, 1986; Hillier, 1991] have been used to estimate the model parameters, the only other reported profile experiments that we can consider here are of the idl type. These are the detection of three profiles against a flat standard: the alternating, the step, and the single component increment profiles. In Part II of this paper, we shall present new data to test the model predictions in the three examples of non-flat standard tests outlined above.

A. Predicting detection thresholds for an alternating profile

The alternating profile [Bernstein and Green, 1987] consists of 21 uniformly distributed components (0.2-5 kHz) that alternate above and below a flat (unit) base (Fig. 5). Thresholds for detection of such a profile are reported at -21.7 dB ($= 20\log\delta_a \approx 20\log 0.08$), where $\delta_a$ is the amplitude of an alternating component relative to the unit base. Such a profile can be considered approximately a ripple at the highest frequency representable by this complex, i.e., 10 cycles per 4.64 octaves, or approximately 2.15 cycle/octave. The amplitude $\delta_a$ of the just detectable ripple at this frequency can be predicted from $K(\Omega_0)$ as $\delta_a = 2K(\Omega_0 = 2.15\ \text{cycle/octave}) \approx 0.1$ (or -20.0 dB), which is close to the measured amplitude (Fig. 5(c)).
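The arithmetic behind the 2.15 cycle/octave figure and the dB values can be checked in a few lines. This sketch only assumes, as the text does, that each pair of adjacent components spans half a ripple cycle, so 21 components give 20 intervals and 10 cycles; `db` is an illustrative helper name. Note that 0.08 is a rounded value: $20\log_{10} 0.08 \approx -21.9$ dB, whereas -21.7 dB corresponds to an amplitude of about 0.082.

```python
import math


def db(amplitude):
    """Amplitude ratio -> dB, i.e. 20 log10(amplitude)."""
    return 20.0 * math.log10(amplitude)


# Alternating profile: 21 components spanning 0.2-5 kHz.
n_components = 21
f_low, f_high = 200.0, 5000.0
span_octaves = math.log2(f_high / f_low)     # about 4.64 octaves
n_cycles = (n_components - 1) / 2            # 20 intervals, 2 per cycle -> 10 cycles
ripple_freq = n_cycles / span_octaves        # about 2.15 cycle/octave

print(f"span = {span_octaves:.2f} octaves, ripple = {ripple_freq:.2f} cycle/octave")
print(f"0.08 -> {db(0.08):.1f} dB; predicted 0.1 -> {db(0.1):.1f} dB")
print(f"-21.7 dB -> amplitude {10 ** (-21.7 / 20):.3f}")
```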
Figure 5: (a) The alternating profile at threshold amplitude (0.08, or -21.7 dB). (b) Ripple spectrum magnitude of the alternating profile in (a). (c) Ripple transform magnitude of the profile in (a). The detection threshold $K(\Omega_0)$ in (c) is reached near 2.2 cycle/octave.

B. Predicting detection thresholds for a step profile

The task in this experiment is to detect the presence of a (linear) step in a 21-component flat standard [Bernstein and Green, 1987] (Fig. 6(a)). For a centrally located step-up (at 1 kHz), threshold is reached at -23.1 dB ($= 20\log\delta_a = 20\log 0.07$), where $\delta_a$ is the height of the step relative to the unit base. Figure 6(b) (solid line) illustrates the ripple spectrum of this (idealized) profile. The corresponding model output $r(\Omega_0)$ is constant because of the constant-Q property of the filters⁵ (Fig. 6(c), solid line). The predicted threshold (0.05, or -26 dB) is smaller than the measured one. However, a more realistic representation of the step has a gradual (or ramped) transition because of cochlear filter smoothing (dashed lines in Fig. 6). Smoothing the ideal profile lowers $|P(\Omega)|$, most strongly at high ripple frequencies (Fig. 6(b)), so the corresponding $r(\Omega_0)$ is more lowpass filtered and just detectable near 2 cycle/octave. Note, however, that the model predicts that more heavily smoothed step profiles should become less detectable.
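The claim that $r(\Omega_0)$ is constant for an idealized step (derived in footnote 5) can be checked numerically. The sketch below assumes a Gaussian filter shape with $\sigma(\Omega_0) = 0.3\,\Omega_0$, a small lower cutoff $\epsilon = 10^{-3}$, and a finite upper limit standing in for infinity; `step_filter_output` is a hypothetical name. After the substitution $\Omega' = \Omega/\Omega_0$, the value of $\Omega_0$ enters only through the lower limit $\epsilon/\Omega_0$, which here changes the result by only about 1% over a 16-fold range of $\Omega_0$, whereas changing $\sigma_{rel}$ changes it substantially.

```python
import numpy as np


def step_filter_output(Omega0, sigma_rel=0.3, eps=1e-3, upper=20.0, n=100_001):
    """r(Omega0) = int_eps^inf (1/Omega) H(Omega; Omega0) dOmega for a step profile
    (|P(Omega)| = 1/|Omega|) and an ASSUMED Gaussian constant-Q filter
    H = exp(-(Omega - Omega0)^2 / (2 sigma^2)), sigma = sigma_rel * Omega0.
    `upper` (in units of Omega/Omega0) approximates the infinite upper limit."""
    # Substitute Omega' = Omega / Omega0 = exp(u), so (1/Omega') dOmega' = du.
    u = np.linspace(np.log(eps / Omega0), np.log(upper), n)
    g = np.exp(-(1.0 - np.exp(u)) ** 2 / (2.0 * sigma_rel ** 2))
    du = u[1] - u[0]
    return float(np.sum(0.5 * (g[1:] + g[:-1])) * du)   # trapezoid rule


for w0 in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"Omega0 = {w0}: r = {step_filter_output(w0):.4f}")
```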

Since the phase of the ripple spectrum does not play a role here, the predicted thresholds remain the same for the reversed (step-down) profile, as is indeed measured. Finally, the simplified model cannot account for the rise in thresholds as the step is moved towards the edges of the spectrum [Bernstein and Green, 1987]. It may be possible, however, to account for this trend by including the effects of the base edges in $p(\omega)$ and by using the complete model (i.e., Eq. (2)).
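Two of the statements above can be illustrated with a discretized step: smoothing lowers the ripple spectrum at high ripple frequencies, and reversing the step leaves $|P(\Omega)|$ unchanged (a mirrored real profile only conjugates its spectrum, and a sign flip only negates it). The grid size, the Gaussian shape of the smoothing kernel, and its 0.1-octave full width at half maximum are assumptions made for this sketch; `ripple_mag` is a hypothetical helper. The kernel is laid out circularly so that multiplying FFTs implements the convolution exactly.

```python
import numpy as np

# Hypothetical discretization of the 0.2-5 kHz range on a uniform octave axis.
n = 4096
span = np.log2(5000.0 / 200.0)                  # about 4.64 octaves
d_omega = span / n
w = np.arange(n) * d_omega                      # octaves above 0.2 kHz
delta_a = 0.07                                  # step height at threshold
w_step = np.log2(1000.0 / 200.0)                # step located at 1 kHz

step_up = np.where(w >= w_step, delta_a, 0.0)   # idealized step (deviation from the base)
step_down = step_up[::-1]                       # mirror image: the reversed step

# ASSUMED smoothing kernel: Gaussian peak with 0.1-octave full width at half maximum,
# laid out circularly so that multiplying FFTs performs the convolution exactly.
sigma = 0.1 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
idx = np.arange(n)
dist = np.minimum(idx, n - idx) * d_omega
kernel = np.exp(-0.5 * (dist / sigma) ** 2)
kernel /= kernel.sum()                          # unit area keeps the step height
smoothed = np.fft.irfft(np.fft.rfft(step_up) * np.fft.rfft(kernel), n)


def ripple_mag(p):
    """|P(Omega)| approximated by a DFT scaled by the sample spacing."""
    return np.abs(np.fft.rfft(p)) * d_omega


Omega = np.fft.rfftfreq(n, d=d_omega)           # cycle/octave
P_up, P_down, P_smooth = ripple_mag(step_up), ripple_mag(step_down), ripple_mag(smoothed)
band = (Omega >= 10.0) & (Omega <= 20.0)
print("reversal leaves |P| unchanged:", np.allclose(P_up, P_down))
print("mean |P| ratio (smoothed / ideal), 10-20 cycle/octave:",
      P_smooth[band].mean() / P_up[band].mean())
```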

C. Predicting detection thresholds for a single component increment profile

In this experiment, a single component in the profile is incremented relative to the base [Green, 1988] (Fig. 7(a)). The threshold is approximately -20.1 dB ($= 20\log\delta_a \approx 20\log 0.09$), where $\delta_a$ is the height of the component relative to the (unit) base. To apply this profile to the ripple analysis model, it is assumed that the finite bandwidth of the cochlear filters broadens the impulse-like profile, making it appear as a narrow symmetric peak profile, e.g., with approximately 0.1 octave bandwidth (measured at the 3 dB points) and the same height as before (0.09). Figure 7(c) illustrates that for such a peak the corresponding output $r(\Omega_0)$ just reaches the perceptual threshold $K(\Omega_0)$ near $\Omega_0 = 2.3$ cycle/octave. Note that approximating the increment by a peak with slightly different bandwidths causes small shifts in the broad $r(\Omega_0)$ without significantly affecting the above estimated thresholds.
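Why such a narrow peak has a broad ripple spectrum follows from the Fourier pair $e^{-\omega^2/2s^2} \leftrightarrow s\sqrt{2\pi}\,e^{-2\pi^2 s^2\Omega^2}$: the narrower the peak, the wider its spectrum. The sketch assumes a Gaussian peak shape (the paper only specifies a narrow symmetric peak) and reads "3 dB points" as the width at $1/\sqrt{2}$ of the peak amplitude; `peak_ripple_spectrum` is a hypothetical name. Under these assumptions $|P(\Omega)|$ at 2.3 cycle/octave is still roughly two thirds of its value at $\Omega = 0$.

```python
import numpy as np


def peak_ripple_spectrum(Omega, height=0.09, bw_octaves=0.1):
    """|P(Omega)| of an ASSUMED Gaussian peak p(w) = height * exp(-w^2 / (2 s^2)) whose
    width between its -3 dB points (amplitude 1/sqrt(2) of the maximum) is bw_octaves.
    Uses exp(-w^2/(2 s^2)) <-> s*sqrt(2 pi)*exp(-2 pi^2 s^2 Omega^2)."""
    s = bw_octaves / (2.0 * np.sqrt(np.log(2.0)))   # -3 dB full width = 2 s sqrt(ln 2)
    return height * s * np.sqrt(2.0 * np.pi) * np.exp(-2.0 * np.pi ** 2 * s ** 2 * Omega ** 2)


Omega = np.array([0.0, 1.0, 2.3, 4.0, 8.0])
rel = peak_ripple_spectrum(Omega) / peak_ripple_spectrum(0.0)
for w0, v in zip(Omega, rel):
    print(f"Omega = {w0:4.1f} cycle/octave: |P| relative to Omega = 0 -> {v:.2f}")
```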

IV.  DISCUSSION 


A. Summary of the ripple analysis model and underlying assumptions

A simplified ripple analysis model is presented to integrate findings from various profile analysis experiments. The basic operation implied by the model is a transformation of the profile into its ripple transform domain. Various manipulations of the profile are then interpreted and detected in this domain. Two sets of assumptions underlie the model: the nature and linearity of the input representation, and the parameters of the ripple analysis filters.

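⁵This assertion can be verified as follows. Consider a constant-Q filter ($\sigma(\Omega_0) = \sigma_{rel}\,\Omega_0$). For a step profile input (with ripple spectrum magnitude $|P(\Omega)| = 1/|\Omega|$), the filter outputs $r(\cdot)$ are given by:

$$r(\Omega_0) = \int_{\epsilon}^{\infty} \frac{1}{\Omega}\, e^{-\frac{(\Omega-\Omega_0)^2}{2\sigma^2(\Omega_0)}}\, d\Omega = \int_{\epsilon}^{\infty} \frac{1}{\Omega}\, e^{-\frac{(1-\Omega/\Omega_0)^2}{2\sigma_{rel}^2}}\, d\Omega = \int_{\epsilon/\Omega_0}^{\infty} \frac{1}{\Omega'}\, e^{-\frac{(1-\Omega')^2}{2\sigma_{rel}^2}}\, d\Omega',$$

where $\Omega' = \Omega/\Omega_0$ and $\epsilon$ is a small positive number. Therefore, apart from a weak dependence through the small lower limit $\epsilon/\Omega_0$, $r(\cdot)$ is a function of $\sigma_{rel}$ only and, specifically, it is independent of $\Omega_0$.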
[Figure axes: $\omega$, octave; $\Omega_0$, cycle/octave]
Figure 6: (a) Profile of a step function (solid line) at threshold amplitude (0.07, or -23.1 dB) and its smoothed version (dashed line). (The smoothed version is obtained by convolving the step with the narrow symmetric peak profile of bandwidth 0.1 octave and amplitude -30 dB with respect to the base.) Magnitudes of (b) ripple spectra and (c) ripple transforms corresponding to the idealized (solid lines) and smoothed (dashed lines) step. The ripple spectrum magnitude of the idealized step is computed as $|P(\Omega)| = 2\,\delta_a/|j2\pi\Omega|$, where $\delta_a$ is the amplitude at threshold.
Figure 7: (a) A smoothed version of a single increment profile on a flat base. The amplitude is set at its perceptual threshold (0.09, or -20.1 dB). The single increment is approximated with the narrow symmetric peak of 0.1 octave bandwidth and -20.1 dB amplitude. The ripple spectrum and ripple transform magnitudes are shown in (b) and (c), respectively.
1. The representation and linearity of the input profiles

It is assumed in this model that the auditory system analyzes the amplitude spectrum on a linear, rather than a logarithmic, scale. Neither is known to be the true auditory representation, and other representations, such as the power spectrum or some output of a cochlear filter model, might be more appropriate. The effects of using a distorted representation are minimal in the cases examined in this paper because such a representation usually creates distortion components of smaller amplitudes that, for idl tests, are effectively sub-threshold at their corresponding filters.
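As one concrete instance of a "distorted representation", take the logarithm of a single-ripple profile $1 + \delta\sin(2\pi\Omega\omega)$ (log compression is our illustrative choice, not a claim about the auditory system). The nonlinearity adds a second harmonic at $2\Omega$ whose amplitude relative to the fundamental is $q/2$, with $q = \delta/(1+\sqrt{1-\delta^2}) \approx \delta/2$, so the distortion component is roughly $\delta/4$ of the ripple itself. The sketch measures this with an FFT on a hypothetical 4-octave grid.

```python
import numpy as np

# A single ripple of relative amplitude delta on a unit base, 2 cycle/octave,
# sampled over exactly 4 octaves (8 whole cycles) so each harmonic falls on one DFT bin.
n, span, ripple = 1024, 4.0, 2.0
w = np.arange(n) * (span / n)


def bin_of(freq):
    """DFT bin index of a ripple frequency (cycle/octave) for this 4-octave window."""
    return int(round(freq * span))


ratios = {}
for delta in (0.05, 0.1, 0.2):
    linear = 1.0 + delta * np.sin(2 * np.pi * ripple * w)
    logged = np.log(linear)                     # the same profile on a logarithmic scale
    S = np.abs(np.fft.rfft(logged)) / (n / 2)   # single-sided amplitudes
    ratios[delta] = S[bin_of(2 * ripple)] / S[bin_of(ripple)]
    print(f"delta = {delta}: 2nd-harmonic / fundamental = {ratios[delta]:.4f}")
```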

The exact nature of the input profile is more consequential in cases where metrics between different complex profiles are considered (see discussion later) or when profiles are added linearly. This brings up a fundamental assumption of the ripple analysis model: that the auditory system analyzes a profile linearly in terms of ripples. How does the cochlea, with all of its nonlinearities, preserve the principle of superposition of spectral ripples? And if it does not, in what form is linearity preserved so as to permit this type of ripple analysis? Hillier attempted to address this issue using adaptation experiments [Hillier, 1991]. Recent models of cochlear processing have also tackled this question [Wang and Shamma, 1994]. However, validation of the linearity hypothesis must await direct psychoacoustical and neurophysiological tests that search for systematic interactions among a small number of simultaneously presented ripples.

2. The parameters of the ripple analysis filters

The filters determine the shape of the ripple transform $r(\Omega_0, \Phi_0)$ and, hence, the interpretation of the results. The choices made here regarding the parameters and shape of these filters satisfy two basic experimental findings reported in [Hillier, 1991] (Secs. 4.4 and 4.5) and in psychophysical experiments in vision using analogous stimuli (summarized in [De Valois and De Valois, 1990]). These are: (1) the filters are roughly constant-Q, and (2) their width is approximately 1 octave (i.e., $\sigma(\Omega_0) = 0.3\,\Omega_0$). It might be argued that other details of the filter shapes (such as their heights and form) can be inferred from the idl and fdl measurements. Such an inference, however, is uncertain, as discussed in Sec. II, since other parameters can readily be adjusted with similar effects on the model outputs. To avoid making such specific commitments in the model, the filters are defined in as general and simple a form as possible.
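The correspondence between $\sigma(\Omega_0) = 0.3\,\Omega_0$ and a width of about 1 octave can be made explicit if one assumes a Gaussian filter and measures its width at half maximum (both are assumptions for this sketch; `octave_bandwidth` is a hypothetical name). The half-maximum points lie at $\Omega_0(1 \pm 0.3\sqrt{2\ln 2})$, and their ratio, about 2.09, is close to one octave. Because $\Omega_0$ cancels, the same octave width holds at every center frequency, which is the constant-Q property.

```python
import math


def octave_bandwidth(sigma_rel=0.3):
    """Full width at half maximum, in octaves, of an ASSUMED Gaussian ripple filter
    H(Omega; Omega0) = exp(-(Omega - Omega0)^2 / (2 sigma^2)), sigma = sigma_rel * Omega0.
    Omega0 cancels out, which is the constant-Q property."""
    half_width = sigma_rel * math.sqrt(2.0 * math.log(2.0))  # in units of Omega0
    return math.log2((1.0 + half_width) / (1.0 - half_width))


for s in (0.2, 0.3, 0.4):
    print(f"sigma_rel = {s}: {octave_bandwidth(s):.2f} octaves")
```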

B. Distinguishing the ripple analysis model from other profile analysis models

The computations outlined in this paper serve to illustrate the competence of the ripple analysis model in accounting for several profile analysis measurements. However, other models, such as the independent channels model [Durlach, Braida and Ito, 1986] and the maximum difference model [Bernstein and Green, 1987], can account for a significant portion of the same data. It is, therefore, important to devise specific tests that can distinguish these models. Two such tests follow from the fundamental predictions that emerge from the ripple analysis model:

Consider any arbitrary spectral profile whose ripple transform magnitude $|r(\Omega_0, \Phi_0)|$ is large relative to the perceptual threshold $K(\Omega_0)$. Then:

(1) If $|r(\cdot)|$ has a lowpass edge approximately in the range 0.85-1 cycle/octave, dilation thresholds are predicted to be constant at 20-30%.

(2) If $|r(\cdot)|$ contains at least some large low-frequency ripples (< 1 cycle/octave), phase sensitivity is expected to be constant at approximately 6°.

Both of these predictions are counterintuitive and, hence, their confirmation would reflect well on the model. They are tested directly in the companion paper [Vranic-Sowers and Shamma, 19xx], where dilation and phase thresholds are measured for a complex (i.e., not a single ripple) profile.

C. Relevance to timbre perception

So far we have focused on the model output around the center of a profile at $\omega_0$. In general, for an arbitrary profile $p(\omega)$, the simplifications leading to Eq. (4) do not necessarily apply. In that case, a spectral profile should be expanded along all three independent axes: $\Omega_0$, $\Phi_0$, and $\omega_0$.

An important aspect of the complete representation is its locality with respect to the tonotopic axis. This is analogous to the locality in time of a spectrogram of running speech. Computationally, the locality of the ripple analysis follows from the relatively broad bandwidths of the ripple filters or, equivalently, the limited extent of their impulse responses (Fig. 1(a)).
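The trade-off between filter bandwidth and impulse-response extent can be quantified for an assumed Gaussian filter $H(\Omega;\Omega_0)$ with $\sigma = 0.3\,\Omega_0$: its impulse response along $\omega$ has a Gaussian envelope with standard deviation $1/(2\pi\sigma)$ octave, so each filter integrates the profile over only a fraction of an octave to about half an octave, and (because the filters are constant-Q) always over about the same number of ripple cycles, roughly $1/(0.3\pi) \approx 1.06$ cycles within $\pm 1$ standard deviation. The sketch, with the hypothetical function `envelope_sd`, computes the envelope spread numerically by an inverse FFT.

```python
import numpy as np


def envelope_sd(omega0, sigma_rel=0.3, n=2**14, span=64.0):
    """Spread (octaves) of the impulse-response envelope of an ASSUMED Gaussian
    ripple filter H(Omega) = exp(-(Omega - omega0)^2 / (2 sigma^2)), sigma = sigma_rel*omega0."""
    d_w = span / n
    w = (np.arange(n) - n // 2) * d_w                  # tonotopic axis, octaves
    Omega = np.fft.fftfreq(n, d=d_w)                   # ripple frequencies, cycle/octave
    sigma = sigma_rel * omega0
    H = np.exp(-(Omega - omega0) ** 2 / (2.0 * sigma ** 2))
    h = np.fft.fftshift(np.fft.ifft(H))                # complex impulse response on w
    weights = np.abs(h) / np.abs(h).sum()              # envelope treated as a distribution
    mean = np.sum(weights * w)
    return float(np.sqrt(np.sum(weights * (w - mean) ** 2)))


for omega0 in (1.0, 2.0, 4.0):
    sd = envelope_sd(omega0)
    print(f"Omega0 = {omega0}: envelope sd = {sd:.3f} octave, "
          f"cycles within +/-1 sd = {2 * sd * omega0:.2f}")
```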

Since changes along any of the three axes are perceptually detectable, comparing two arbitrary profiles must be done along all three dimensions of the ripple representation. Metrics based on this representation (e.g., a simple Euclidean distance) might be considerably simpler than metrics based on the spectral domain representation, since the latter often imply "conditioning" operations that are already included in the ripple transform representation. For example, the metric suggested by [Assmann and Summerfield, 1989] applies (among other things) a second derivative to the profile, effectively de-emphasizing the slow variations (or, equivalently, the low-frequency ripples) of the profile. Such an operation is implied in the model by the "highpass" (left) edge of the idl curve.
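The de-emphasis produced by a second derivative follows from a standard Fourier property: differentiating $\sin(2\pi\Omega\omega)$ twice multiplies its amplitude by $(2\pi\Omega)^2$, so a 0.5 cycle/octave ripple is attenuated 16-fold relative to a 2 cycle/octave ripple. The sketch below checks this with a periodic central second difference on a hypothetical 8-octave grid; `second_derivative_gain` is an illustrative name, not part of any cited metric.

```python
import numpy as np

# Hypothetical sampling: 8 octaves, 4096 points; the test ripples fit whole cycles.
n, span = 4096, 8.0
d_w = span / n
w = np.arange(n) * d_w


def second_derivative_gain(ripple_freq):
    """Amplitude gain of a periodic central second difference applied to a unit
    ripple sin(2 pi Omega w); the continuous-domain value is (2 pi Omega)^2."""
    p = np.sin(2 * np.pi * ripple_freq * w)
    d2 = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / d_w ** 2
    return float(np.max(np.abs(d2)))


for f in (0.5, 1.0, 2.0, 4.0):
    print(f"Omega = {f}: gain = {second_derivative_gain(f):8.1f}, "
          f"(2 pi Omega)^2 = {(2 * np.pi * f) ** 2:8.1f}")
```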

D. Relation to rippled noise stimuli and the pitch of complex sounds

A different rippled-spectrum stimulus that has been widely used in studies of pitch perception is the so-called rippled noise [Yost and Hill, 1979]. It has a sinusoidal spectral envelope defined on a linear, rather than a logarithmic, frequency axis, i.e., it is similar to a harmonic spectrum. On a logarithmic frequency axis, however, a harmonic spectrum appears to have an exponentially increasing ripple frequency.

The representation of harmonic spectra in the ripple analysis model leads to many interesting hypotheses regarding the encoding of complex pitch in the auditory system. Of immediate relevance to the focus of this paper, however, is the interpretation of the "dominance region" in pitch models. Specifically, it has long been known that the 2nd, 3rd, and 4th harmonics in a series are dominant in conveying the pitch of the complex [Plomp, 1976]. Computationally, pitch models have accounted for this phenomenon by emphasizing (or weighting more heavily) these regions of the spectral profile prior to estimating the pitch strength and value [Yost and Hill, 1979].

The dominance region can be viewed in the context of the ripple analysis model as a correlate of the ripple-idl sensitivity curve (Fig. 2(c)), which has its lowest thresholds for ripples around 2 cycle/octave. In a harmonic spectrum drawn on a logarithmic axis, the spectral profile around the 2nd to 4th harmonics has ripple frequencies close to this value. Thus, the ripple-idl curve may share the same origins as the dominance region, namely a combination of suppressive and other interactions at relatively peripheral stages of the auditory system [Bilsen et al., 1975; Wang and Shamma, 1994; Yost and Hill, 1979].
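Both points above can be made concrete by treating the spacing between adjacent harmonics on a $\log_2$ frequency axis as one ripple cycle (a simplifying assumption; `local_ripple_frequency` is a hypothetical name). Harmonics $n$ and $n+1$ are $\log_2((n+1)/n)$ octaves apart, giving a local ripple frequency of 1.71 and 2.41 cycle/octave between the 2nd-3rd and 3rd-4th harmonics, near the 2 cycle/octave minimum of the idl curve. For large $n$ the local frequency grows approximately as $n\ln 2$, i.e., exponentially with log frequency.

```python
import math


def local_ripple_frequency(n):
    """Local ripple frequency (cycle/octave) between harmonics n and n+1 of a
    harmonic spectrum drawn on a log2 frequency axis: one cycle per harmonic spacing."""
    return 1.0 / math.log2((n + 1) / n)


for n in range(1, 7):
    print(f"harmonics {n}-{n + 1}: {local_ripple_frequency(n):.2f} cycle/octave")
```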

E. Relation to visual processing

An appealing aspect of the ripple analysis model is that it shares the conceptual framework of spatial frequency analysis that has long been prevalent in visual processing. While it has its critics (see reviews in [De Valois and De Valois, 1990]), this approach is supported by substantial anatomical, neurophysiological, and psychophysical evidence, elegantly detailed in [De Valois and De Valois, 1990]. Interestingly, in the vision community, the idea that the brain performs a local Fourier transformation is motivated by its similarity to the cochlear transformations of the auditory system! From the perspective of the auditory system, however, the cochlea and early auditory stages simply transform a sound into an input spectral profile. The auditory nervous system then treats this profile the same way the visual system treats its retinal image.

The notion of a Fourier transformation of a spectrum is common in engineering speech applications and is known as convolutional homomorphic processing. It involves computing Fourier-like coefficients of the spectral profile, known as cepstral coefficients [Oppenheim and Schafer, 1990]. While quite different in detail, the cepstral coefficients encode roughly the same types of information about the shape of the spectrum as the ripple transform, and they have been found especially useful in automatic speech recognition systems.
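For comparison, the sketch below computes a standard real cepstrum, the inverse FFT of the log magnitude spectrum, on a linear DFT frequency axis (unlike the octave axis of the ripple transform). A signal plus a circular echo of delay $d$ samples has a spectrum rippled with period $1/d$, and that ripple appears as a cepstral peak at quefrency $d$ with value about half the echo gain. The signal, delay, and gain are arbitrary illustrative choices, and `real_cepstrum` is a hypothetical helper, not a library function.

```python
import numpy as np


def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum of x."""
    return np.fft.irfft(np.log(np.abs(np.fft.rfft(x))), len(x))


rng = np.random.default_rng(0)
n, delay, gain = 4096, 50, 0.5
s = rng.standard_normal(n)
x = s + gain * np.roll(s, delay)           # circular echo: spectral ripple with period 1/delay
c = real_cepstrum(x)
peak = 1 + int(np.argmax(c[1 : n // 2]))   # skip quefrency 0
print("cepstral peak at quefrency", peak, "with value", round(float(c[peak]), 3))
```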

Finally, the correspondence between the auditory ripple analysis and the visual spatial frequency analysis goes deeper than a mere analogy. As evidence for this claim, consider the closely matched values of the filter parameters and detection thresholds measured in the auditory and visual systems, e.g., roughly constant-Q, 1-octave-wide filters, and a phase threshold that is constant at about 6° and rises at higher ripple frequencies (Table 6.1 and Figs. 6.11 and 8.3 in [De Valois and De Valois, 1990]). These remarkable equivalences may simply reflect modality-independent limitations imposed by identically structured sensory areas in the central nervous system. For instance, the resolution of the analysis filters may be dictated by developmental rules limiting the minimum divergence or convergence of dendritic fields along the sensory epithelium. Clearly, exploring further equivalences and differences between similarly defined psychophysical measures, e.g., fdl's for ripples versus gratings (which apparently have not been reported in the literature), would shed considerable light on the underlying functional organization of both systems.